MatchZoo: A Toolkit for Deep Text Matching

نویسندگان

  • Yixing Fan
  • Liang Pang
  • Jianpeng Hou
  • Jiafeng Guo
  • Yanyan Lan
  • Xueqi Cheng
چکیده

In recent years, deep neural models have been widely adopted for text matching tasks, such as question answering and information retrieval, showing improved performance as compared with previous methods. In this paper, we introduce the MatchZoo toolkit that aims to facilitate the designing, comparing and sharing of deep text matching models. Speci€cally, the toolkit provides a uni€ed data preparation module for di‚erent text matching problems, a ƒexible layer-based model construction process, and a variety of training objectives and evaluation metrics. In addition, the toolkit has implemented two schools of representative deep text matching models, namely representation-focused models and interactionfocused models. Finally, users can easily modify existing models, create and share their own models for text matching in MatchZoo.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Deep Fusion LSTMs for Text Semantic Matching

Recently, there is rising interest in modelling the interactions of text pair with deep neural networks. In this paper, we propose a model of deep fusion LSTMs (DF-LSTMs) to model the strong interaction of text pair in a recursive matching way. Specifically, DF-LSTMs consist of two interdependent LSTMs, each of which models a sequence under the influence of another. We also use external memory ...

متن کامل

Semantic Annotation, Analysis and Comparison: A Multilingual and Cross-lingual Text Analytics Toolkit

Within the context of globalization, multilinguality and cross-linguality for information access have emerged as issues of major interest. In order to achieve the goal that users from all countries have access to the same information, there is an impending need for systems that can help in overcoming language barriers by facilitating multilingual and cross-lingual access to data. In this paper,...

متن کامل

The Joy of Parallelism with CzEng 1.0

CzEng 1.0 is an updated release of our Czech-English parallel corpus, freely available for non-commercial research or educational purposes. In this release, we approximately doubled the corpus size, reaching 15 million sentence pairs (about 200 million tokens per language). More importantly, we carefully filtered the data to reduce the amount of non-matching sentence pairs. CzEng 1.0 is automat...

متن کامل

MultiGranCNN: An Architecture for General Matching of Text Chunks on Multiple Levels of Granularity

We present MultiGranCNN, a general deep learning architecture for matching text chunks. MultiGranCNN supports multigranular comparability of representations: shorter sequences in one chunk can be directly compared to longer sequences in the other chunk. MultiGranCNN also contains a flexible and modularized match feature component that is easily adaptable to different types of chunk matching. We...

متن کامل

Familia: An Open-Source Toolkit for Industrial Topic Modeling

Familia is an open-source toolkit for pragmatic topic modeling in industry. Familia abstracts the utilities of topic modeling in industry as two paradigms: semantic representation and semantic matching. Efficient implementations of the two paradigms are made publicly available for the first time. Furthermore, we provide off-the-shelf topic models trained on large-scale industrial corpora, inclu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1707.07270  شماره 

صفحات  -

تاریخ انتشار 2017